-
We describe a modular design approach for creating versatile DNA origami subunits that can target diverse self-assembled structures. The subunit consists of a constant “core module” with variable “bond modules” and “angle modules” added to its exterior, controlling interaction specificity, strength, and structural geometry. The design features flexible joints between subunits, implemented by using single-stranded angle modules, whose mechanical properties and possible conformations are characterized by cryogenic electron microscopy and coarse-grained molecular modeling. We demonstrate the design’s versatility through the assembly of structures with different Gaussian curvature, including sheets, spherical shells, and tubes. Our findings suggest that incorporating a judicious amount of flexibility in the bonds provides error tolerance in design and fabrication while maintaining target fidelity. Furthermore, off-target assemblies potentially introduced by flexibility can be counterbalanced by increasing the number of distinct bonds. This approach enables precise targeting of specific structural binding angles across a broad range of configurations by eliminating unfavorable interactions.
-
Instruction tuning is critical for adapting large language models (LLMs) to downstream tasks, and recent studies have demonstrated that small amounts of human-curated data can outperform larger datasets, challenging traditional data scaling laws. While LLM-based data quality rating systems offer a cost-effective alternative to human annotation, they often suffer from inaccuracies and biases, even in powerful models like GPT-4. In this work, we introduce DS2, a Diversity-aware Score curation method for Data Selection. By systematically modeling error patterns through a score transition matrix, DS2 corrects LLM-based scores and promotes diversity in the selected data samples. Our approach shows that a curated subset (just 3.3% of the original dataset) outperforms full-scale datasets (300k samples) across various machine-alignment benchmarks, and matches or surpasses human-aligned datasets such as LIMA with the same sample size (1k samples). These findings challenge conventional data scaling assumptions, highlighting that redundant, low-quality samples can degrade performance and reaffirming that "more can be less."
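The abstract names two ingredients of DS2: a score transition matrix that corrects noisy LLM ratings, and a diversity term in selection. As a rough illustration of those two ideas only, and not the authors' algorithm, the sketch below assumes a known transition matrix `T`, where `T[i][j]` is the probability that the LLM reports rating `j` when the true rating is `i`. It applies a plain Bayes-rule correction to get an expected "true" score per sample, then greedily keeps high-scoring samples while skipping near-duplicates by cosine similarity. The function names, the prior, the toy embeddings, and the 0.9 similarity threshold are all hypothetical.

```python
import math


def correct_scores(observed, T, prior, levels):
    """Expected true rating for each observed LLM rating (Bayes' rule).

    Hypothetical setup: T[i][j] = P(observed rating j | true rating i),
    prior[i] = P(true rating i), levels[i] = numeric value of rating level i.
    Assumes every observed rating has nonzero probability under T and prior.
    """
    corrected = []
    for j in observed:
        # Unnormalized posterior over true ratings given observed rating j.
        joint = [prior[i] * T[i][j] for i in range(len(prior))]
        z = sum(joint)
        corrected.append(sum(levels[i] * joint[i] / z for i in range(len(prior))))
    return corrected


def cosine(u, v):
    """Cosine similarity of two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def select_diverse(scores, embeddings, k, max_sim=0.9):
    """Greedily pick up to k samples by corrected score, skipping near-duplicates.

    A sample is skipped if its cosine similarity to any already chosen sample
    is >= max_sim (a hypothetical diversity criterion).
    """
    order = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)
    chosen = []
    for idx in order:
        if len(chosen) == k:
            break
        if all(cosine(embeddings[idx], embeddings[c]) < max_sim for c in chosen):
            chosen.append(idx)
    return chosen


if __name__ == "__main__":
    # Toy rater that tends to overrate by one level (rows sum to 1).
    T = [[0.7, 0.3, 0.0],
         [0.0, 0.6, 0.4],
         [0.0, 0.0, 1.0]]
    prior = [1 / 3, 1 / 3, 1 / 3]
    levels = [0, 1, 2]
    scores = correct_scores([2, 2, 1, 0], T, prior, levels)
    # Samples 0 and 1 are exact duplicates in embedding space.
    emb = [[1, 0], [1, 0], [0, 1], [1, 1]]
    print(select_diverse(scores, emb, k=2))  # → [0, 2]
```

With this toy rater, an observed top rating of 2 is pulled down to about 1.71 rather than taken at face value, and the duplicate of the first pick is skipped in favor of a dissimilar sample.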